A systolic array is a natural architecture for the
implementation of a Reed-Solomon (RS) encoder and decoder.
It possesses many of the properties desired for a
special-purpose application: simple and regular design,
concurrency, modular expansibility, fast response time,
cost-effectiveness, and high reliability. As a result, it is
very well suited for VLSI implementation, where a simple and
regular design is essential.

This thesis takes a modular approach to the design of a 
systolic array based RS encoder and decoder. Initially, the 
concept of systolic arrays is discussed followed by an 
introduction to finite field theory and Reed-Solomon codes. 
Then it is shown how RS codes can be encoded and decoded with 
primitive shift registers and implemented using a systolic 
architecture. In this way, the reader can see how these
pieces fit together to produce the overall implementation.

I. INTRODUCTION



In this volatile, technological age, it is imperative that
communication links and computer memories transmit
information reliably and quickly. In many cases, however,
this is virtually impossible because noise causes the
received data to differ significantly from the original
data. To rectify this situation, error-correcting codes have
been developed that enable a system to maintain a high
degree of reliability despite the presence of noise. To
accomplish the error correction, some redundant check bits,
or parity bits, are transmitted in addition to the data or
information bits. In this way, although the noise may
introduce some errors in either the transmitted data bits or
the transmitted check bits, there are usually still enough
uncorrupted bits available to the receiver to allow a
sophisticated decoder to correct the errors. In fact, only
a modest amount of redundancy is actually needed to ensure
that the probability of a decoding error is negligibly
small. [Ref. 1]

Moreover, unlike the encoders and decoders of the 1950's and
1960's, which were constrained by digital hardware costs and
virtually nonexistent chip technology, today's encoders and
decoders, coupled with significant improvements in their
associated algorithms, have become, and will continue to be,
increasingly attractive from an economic viewpoint.

One class of error-correcting codes that is very popular in
communication circles, and that is central to this author's
discussion of systolic array encoders and decoders, is the
Reed-Solomon (RS) codes. These codes can correct both random
and burst errors over a communication channel, and as such
are ideal for the very low error probabilities needed for
reliable space communications. Still, the RS codes are only
as practical as the encoder that produces them and the
decoder by which errors are corrected. The encoder
complexity is directly proportional to the error-correcting
capability of the code, the speed of the encoding process,
and the interleaving level used, i.e., the number of
original codewords which are multiplexed together to
increase the immunity of codes to burst errors [Ref. 1]. In
fact, for truly reliable space communications there is a
bona fide need to use RS codes with a large error-correcting
capability and an equally large interleaving level. As a
result, one is especially interested in minimizing the
complexity of an RS encoder while simultaneously ensuring
maximum performance and high reliability. Clearly, what is
needed for this type of application is a special-purpose
system which complements the aforementioned attributes.
Therefore, a systolic array is a natural architecture for
the simple, regular, and cost-effective implementation of an
RS encoder and decoder.

To aid comprehension, this author has devoted a chapter to
each of the topics vital to the thesis. After systolic
arrays are introduced in Chapter II, the fundamentals of
finite fields necessary for an understanding of Reed-Solomon
codes are discussed in Chapters III and IV. In Chapter V a
systolic array multiplier for finite fields is discussed,
and finally in Chapter VI the encoder and decoder for binary
codes are described, as well as the encoder and decoder for
RS codes.

II. SYSTOLIC ARRAYS



A. BACKGROUND 

It is clear today that developments in microelectronics
have had a revolutionary impact on computer design
[Ref. 2]. For example, integrated circuit technology has
brought a significant increase in the number and complexity
of components that can now fit on a chip or a printed
circuit board. In fact, with the component density presently
doubling every one to two years, the notion of the
million-transistor chip will soon be a reality [Ref. 3].
Commensurate with this major increase in chip density is the
utilization of highly parallel computing structures which,
almost by definition, imply a basic computational element
repeated hundreds or thousands of times. This architectural
style, which has structural properties suitable for VLSI
implementation, reduces the design problem by several orders
of magnitude. As a result, we are interested in
high-performance parallel structures that can be implemented
directly via very economical hardware devices [Ref. 2]. In
other words, cost-effectiveness has always been, and will
continue to be, a major concern in designing special-purpose
VLSI systems; their cost must be low enough to justify their
limited applicability. Furthermore, if a structure can
truly be decomposed into a few types of building blocks
which are used repetitively with simple interfaces,
tremendous savings can be achieved.

This is especially true for VLSI designs where a single 
chip usually comprises hundreds of thousands of identical 
components. Clearly, in order to overcome this 
complexity, simple and regular designs are essential. In 
fact, VLSI systems which are based on simple, regular lay- 
outs are very likely to be modular and therefore adjustable 
to various performance levels. Still, with the technological 
indication of a diminishing growth rate for component speed, 
any major improvement in computation speed must come from 
the concurrent use of many processing elements. [Ref. 3] 

The degree of concurrency in a VLSI computing structure
is largely determined by the underlying algorithm.
Consequently, massive parallelism can be achieved if the
algorithm is designed to exploit high degrees of pipelining
and multiprocessing. For instance, when a large number of
processing elements work simultaneously, coordination and
communication become significant, especially with VLSI
technology where routing costs dominate the power, time, and
area required to implement a computation. Thus, the
requirement is to design algorithms that support high
degrees of concurrency, and at the same time to employ only
simple, regular communication and control to ensure
efficient implementation. [Ref. 4]

Clearly, what is required is a special-purpose design
which employs simple and regular communication paths for
multiprocessor structures in addition to pipelining as a
general method for utilizing these structures. In short,
systolic arrays provide a realistic model of computation
which captures these concepts of pipelining, parallelism,
and interconnection structures.

According to Kung and Leiserson [Ref. 2]: 

A systolic array is a collection of relatively simple 
processing units, usually of the same type, which are 
connected together by a simple communication network and 
that operate in parallel, as depicted in Figure 2.1. 
The performance advantage of a systolic array architec- 
ture is that it uses each datum retrieved from memory 
numerous times without having to store and retrieve 
intermediate results, thus allowing significant speedups 
relative to the memory bandwidth. Thus, a systolic 
system is a network of processors which rhythmically 
computes and passes data through the system. The 
analogy is to the rhythmic contraction of the heart 
which pulses blood through the circulatory system of the 
body. Each processor in a systolic network can be 
thought of as an element through which multiple streams 
of data are pumped. The regular beating of these 
parallel processors maintains a constant flow of data 
throughout the entire network. As data items are pumped 
through the network some constant-time computation is 
performed and, depending on the operation, updates of 
some of the items may occur. However, unlike the 
closed-loop circulatory system of the body, a systolic 
computing system usually has ports into which inputs 
flow, and ports from which the results of the computa- 
tion are received. Thus, a systolic system can be 
viewed as a pipelined system — one in which input and 
output occur with every pulsation. 

As a result, a systolic array is extremely attractive for a
wide class of compute-bound computations where multiple
operations are performed on each data item in a repetitive
manner.

(a) ONE-DIMENSIONAL LINEAR ARRAY

(d) TWO-DIMENSIONAL HEXAGONAL ARRAY

Figure 2.1 Various Systolic Array Configurations

B. PRINCIPLE OF OPERATION

The basic principle of a systolic array is illustrated
in Figure 2.2. As stated earlier, by replacing a single
processing element (PE) with an array of processing
elements, a higher computation throughput can be achieved
without increasing the memory bandwidth.

Suppose each processing element in Figure 2.2 operates 
with a clock period of 100 nanoseconds (ns). The con- 
ventional memory-processor organization in Figure 2.2a 
has at most a performance of 5 million operations per 
second (MOPS). With the same clock rate, the systolic 
array processor will result in 30 MOPS performance. 
This gain in processing speed can also be justified with 
the fact that the number of pipeline stages has been 
increased six times in Figure 2.2b. Being able to use 
each input data item a number of times is just one of 
the many advantages of the systolic approach. Other 
advantages include modular expansibility, simple and 
regular data and control flows, use of simple and 
uniform cells, elimination of global broadcasting, 
limited fan-in and fast response time. [Ref. 3] 

Given these advantages, a systolic array is a natural
architecture for the implementation of an RS encoder and
decoder, as will become apparent after the introduction of
Reed-Solomon codes in Chapter IV.

(b) A SYSTOLIC PROCESSOR ARRAY

Figure 2.2 The Concept of a Systolic Processor Array

III. FINITE FIELD THEORY



A. BACKGROUND 

Finite or Galois fields (named after the nineteenth
century French mathematician Evariste Galois) play many
important and diverse roles in numerous applications ranging
from digital signal processing to switching theory. However,
in this thesis we are concerned with their use in the
construction of Reed-Solomon error-correcting codes. We
begin with a general analysis of the pertinent facts
regarding finite fields. In the next chapter the necessary
facts about Reed-Solomon codes are discussed.

A field is a set of elements, including 0 and 1, any
pair of which may be added or multiplied (denoted by + and
*, respectively) to give a unique result in the field. The
addition and multiplication are associative and commutative,
and multiplication distributes over addition in the usual
way: u*(v+w) = u*v + u*w. Every field element u has a unique
negative -u such that u+(-u) = 0. Every nonzero field element
u has a unique reciprocal field element 1/u, such that
u*(1/u) = 1. For every field element u, 0+u = u = 1*u, and
0*u = 0. Thus the numbers 0 and 1 are the additive and
multiplicative identities, respectively. [Ref. 5]

The order of a field is the number of elements in the
field. If the order is infinite, we call the field an
infinite field. The rational numbers, the real numbers, and
the complex numbers are all examples of infinite fields. If
the number of elements is finite, we call the field a finite
field [Ref. 5].

For any prime p and any positive integer m, a Galois
field denoted $GF(p^{m})$, or $GF(q)$ with $q = p^{m}$, exists.
We can construct a field containing $p^{m}$ elements as an
algebra of polynomials modulo an irreducible polynomial over
GF(p) of degree m. Addition is coefficient-by-coefficient
addition modulo p (bit-by-bit when p = 2).

The multiplicative group of the nonzero field elements
is cyclic, i.e., it is a group that consists of all the
powers of one of its elements, $\beta$. Multiplication is
defined as $\beta^{i} * \beta^{j} = \beta^{i+j}$, where i+j is
computed modulo $(p^{m}-1)$ and $\beta$ is a generator of this
group. A generator of this multiplicative group, called a
primitive element, is a root of an irreducible polynomial
over the prime field GF(p). This irreducible polynomial,
called a primitive polynomial, is the minimal polynomial of
the primitive element, i.e., the polynomial of least degree
with the primitive element $\beta$ as a root. Generally
speaking, an irreducible polynomial is analogous to a prime
number: it has no nontrivial factors.

Lastly, the Galois fields that can be created by taking
residue or equivalence classes of polynomials modulo an
irreducible polynomial over GF(p) are said to be fields of
characteristic p. Thus, $GF(p^{m})$ is a field of
characteristic p for each choice of positive integer m
[Ref. 6].

B. AN EXAMPLE OF THE CREATION OF A FIELD

Consider the Galois field $GF(2^{4})$. It has $2^{4}$ elements
and may be constructed as the field of polynomials over
GF(2) modulo the irreducible polynomial $1+x+x^{4}$. If we let
$\beta$ represent a root of $1+x+x^{4}$, then it is also a
primitive element of the field. Field addition of the
elements is bit-by-bit modulo 2 addition, while
multiplication of the elements is described using the
primitivity of the element $\beta$. Thus,
$\beta^{i} * \beta^{j} = \beta^{i+j}$, where i+j is reduced
modulo 15, if necessary. For example, given the field
elements $\beta^{13}$ and $\beta^{9}$, two of the 15 nonzero
field elements listed in Table I, we can easily demonstrate
both operations:

$$\beta^{13}+\beta^{9} = (\beta^{3}+\beta^{2}+1)+(\beta^{3}+\beta) = 2\beta^{3}+\beta^{2}+\beta+1 = \beta^{2}+\beta+1 = \beta^{10}$$

while

$$\beta^{13} * \beta^{9} = \beta^{13+9} = \beta^{22} = \beta^{22-15} = \beta^{7} = \beta^{3}+\beta+1.$$



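TABLE I

REPRESENTATION OF $GF(2^{4})$

FIELD ELEMENT                      BETA POLYNOMIAL                          4-TUPLE

$\beta^{0} = 1$                    $1$                                      1 0 0 0
$\beta^{1} = \beta$                $\beta$                                  0 1 0 0
$\beta^{2} = \beta^{2}$            $\beta^{2}$                              0 0 1 0
$\beta^{3} = \beta^{3}$            $\beta^{3}$                              0 0 0 1
$\beta^{4} = \beta + 1$            $1 + \beta$                              1 1 0 0
$\beta^{5} = \beta(\beta^{4})$     $\beta + \beta^{2}$                      0 1 1 0
$\beta^{6} = \beta(\beta^{5})$     $\beta^{2} + \beta^{3}$                  0 0 1 1
$\beta^{7} = \beta(\beta^{6})$     $1 + \beta + \beta^{3}$                  1 1 0 1
$\beta^{8} = \beta(\beta^{7})$     $1 + \beta^{2}$                          1 0 1 0
$\beta^{9} = \beta(\beta^{8})$     $\beta + \beta^{3}$                      0 1 0 1
$\beta^{10} = \beta(\beta^{9})$    $1 + \beta + \beta^{2}$                  1 1 1 0
$\beta^{11} = \beta(\beta^{10})$   $\beta + \beta^{2} + \beta^{3}$          0 1 1 1
$\beta^{12} = \beta(\beta^{11})$   $1 + \beta + \beta^{2} + \beta^{3}$      1 1 1 1
$\beta^{13} = \beta(\beta^{12})$   $1 + \beta^{2} + \beta^{3}$              1 0 1 1
$\beta^{14} = \beta(\beta^{13})$   $1 + \beta^{3}$                          1 0 0 1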
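
The construction of Table I and the two worked operations above can be checked mechanically. The following Python sketch is illustrative only and is not part of the original design: the names `build_tables`, `gf_add`, `gf_mul`, and `four_tuple` are assumptions introduced here, and each field element is stored as an integer whose bit i holds the coefficient of the i-th power of beta.

```python
# GF(2^4) built from p(x) = 1 + x + x^4, i.e. beta^4 = beta + 1.
# An element is an int whose bit i is the coefficient of beta^i.
PRIMITIVE_POLY = 0b10011  # x^4 + x + 1


def build_tables():
    """Return (exp, log) with exp[k] = beta^k for k = 0..14."""
    exp, log = [], {}
    value = 1
    for power in range(15):
        exp.append(value)
        log[value] = power
        value <<= 1                  # multiply by beta
        if value & 0b10000:          # a beta^4 term appeared
            value ^= PRIMITIVE_POLY  # replace beta^4 by beta + 1
    return exp, log


EXP, LOG = build_tables()


def gf_add(a, b):
    """Bit-by-bit modulo 2 addition."""
    return a ^ b


def gf_mul(a, b):
    """Multiply nonzero elements by adding exponents modulo 15."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 15]


def four_tuple(value):
    """Coefficients of 1, beta, beta^2, beta^3 (column order of Table I)."""
    return " ".join(str((value >> i) & 1) for i in range(4))


for power, value in enumerate(EXP):
    print(f"beta^{power:<2}  {four_tuple(value)}")

print("beta^13 + beta^9 = beta^%d" % LOG[gf_add(EXP[13], EXP[9])])
print("beta^13 * beta^9 = beta^%d" % LOG[gf_mul(EXP[13], EXP[9])])
```

The last two lines print the exponents 10 and 7, in agreement with the worked example. Storing the exponent table once and multiplying by exponent addition mirrors the definition of multiplication given in the text; other representations are equally possible.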
IV. REED-SOLOMON CODES



A. BACKGROUND 

Reed-Solomon (RS) codes are Bose-Chaudhuri-Hocquenghem
(BCH) codes over GF(q) of length q-1. They are
error-correcting codes which are used in many
special-purpose applications ranging from deep-space
communications and spread spectrum to digital audio disk
systems and secure data transmissions [Ref. 7]. These codes
can correct both random and burst errors over a
communication channel and hence are ideal for the real-time,
reliable operation demanded by these applications. The
complexity of RS encoders and decoders is proportional to
the error-correcting capability of the code, the speed of
the decoding, and the interleaving depth used [Ref. 8]. For
truly reliable communications there is a very strong
tendency to use RS codes with a large error-correcting
capability and an equally large interleaving level. Hence,
one is especially interested in minimizing the complexity of
RS encoders and decoders for communications and other
pertinent applications. Toward this end, there is
considerable interest in systolic array construction and
eventual VLSI implementation of RS encoders and decoders
which yield significant savings in size, weight, and power
consumption while simultaneously providing high reliability.

In this chapter we look at a generic construction and
architecture of an RS encoder developed by Johl [Ref. 9]
and use this design as a foundation for the subsequent
discussion and implementation in the later chapters. This
implementation utilizes a systolic architecture of identical
cells arranged in a linear array, each executing a
finite-field multiplication and addition in a pipelined
manner, thereby significantly increasing the throughput
rate. Also, since the layout of the cell need only be done
once and then replicated, it is extremely attractive for
eventual VLSI implementation.

B. GENERIC ARCHITECTURE 

The RS code is a block code whose symbols consist of more
than one bit. When each symbol is J bits wide, an RS
codeword has $2^{J}-1$ symbols. As depicted in Figure 4.1, an
RS code can be designed to correct E errors, with each
codeword consisting of I information symbols together with
2E parity or check symbols. As an example, the irreducible
polynomial $1+\beta+\beta^{4}=0$ and its corresponding finite
field, as described in Table I, establish the foundation for
the development of a generic RS encoder. This RS code
consists of a total of 15 four-bit symbols for each
codeword. If this particular code is to correct one error,
it needs two parity symbols and therefore contains thirteen
information symbols.

Figure 4.1 A Reed-Solomon Codeword

This representation is known as an RS (15,13) code, where
the first integer is the total number of symbols in the
codeword, and the second integer is the number of
information symbols. It is the responsibility of the encoder
to use the information symbols to generate the check or
parity symbols for the codeword. The information symbols are
treated as coefficients of a polynomial f(x),

$$f(x) = \sum_{i=1}^{2^{J}-1-2E} f_{i}\, x^{2^{J}-1-i}$$

where $f_{i}$ is the $i$th transmitted information symbol. The
corresponding generator polynomial is known as g(x).



$$g(x) = \prod_{i=1}^{2E} (x+\beta^{i})$$


Then, the 2E parity symbols are defined as the coefficients
of the remainder of f(x)/g(x). Therefore, in the RS (15,13)
code previously mentioned,

$$
\begin{aligned}
g(x) &= \prod_{i=1}^{2} (x+\beta^{i}) \\
     &= (x+\beta^{1})(x+\beta^{2}) \\
     &= x^{2}+(\beta^{1}+\beta^{2})x+\beta^{3} \\
     &= x^{2}+\beta^{5}x+\beta^{3}
\end{aligned}
$$
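
As a rough check of this expansion, the product of the factors can be multiplied out numerically. The sketch below is an illustration under stated assumptions, not the encoder of [Ref. 9]: the helper names `poly_mul` and `generator_poly` are hypothetical, polynomials are lists of field elements with the highest-degree coefficient first, and field elements are integers whose bit i is the coefficient of the i-th power of beta.

```python
# GF(2^4) tables for beta^4 = beta + 1 (bit i = coefficient of beta^i).
EXP, LOG = [], {}
_value = 1
for _power in range(15):
    EXP.append(_value)
    LOG[_value] = _power
    _value <<= 1
    if _value & 0b10000:
        _value ^= 0b10011


def gf_mul(a, b):
    """Field multiplication by exponent addition modulo 15."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 15]


def poly_mul(a, b):
    """Multiply two polynomials (highest-degree coefficient first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] ^= gf_mul(ai, bj)   # addition in GF(2^4) is XOR
    return out


def generator_poly(num_errors):
    """g(x) = product of (x + beta^i) for i = 1 .. 2E."""
    g = [1]
    for i in range(1, 2 * num_errors + 1):
        g = poly_mul(g, [1, EXP[i % 15]])
    return g


g = generator_poly(1)
print([LOG[c] for c in g])   # exponents of the x^2, x^1, x^0 coefficients
```

For E = 1 this prints `[0, 5, 3]`, i.e. the coefficients are the 0th, 5th, and 3rd powers of beta, matching the last line of the derivation.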
Furthermore, let us assume that the thirteen information
symbols are $\beta^{6}, \beta^{1}, \beta^{8}, \beta^{2}, \beta^{4}, \beta^{5}, \beta^{12}, \beta^{7}, \beta^{9}, \beta^{11}, \beta^{14}, \beta^{3}, \beta^{13}$.
Then,

$$
\begin{aligned}
f(x) &= \sum_{i=1}^{13} f_{i}\, x^{15-i} \\
     &= f_{1}x^{14}+f_{2}x^{13}+f_{3}x^{12}+f_{4}x^{11}+f_{5}x^{10}+f_{6}x^{9}+f_{7}x^{8}+f_{8}x^{7}+f_{9}x^{6}+f_{10}x^{5}+f_{11}x^{4}+f_{12}x^{3}+f_{13}x^{2} \\
     &= \beta^{6}x^{14}+\beta^{1}x^{13}+\beta^{8}x^{12}+\beta^{2}x^{11}+\beta^{4}x^{10}+\beta^{5}x^{9}+\beta^{12}x^{8}+\beta^{7}x^{7}+\beta^{9}x^{6}+\beta^{11}x^{5}+\beta^{14}x^{4}+\beta^{3}x^{3}+\beta^{13}x^{2}
\end{aligned}
$$
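Performing the required division, f(x)/g(x), with the
divisor $x^{2}+\beta^{5}x+\beta^{3}$ gives the quotient

$$\beta^{6}x^{12}+\beta^{6}x^{11}+\beta^{0}x^{10}+\cdots+\beta^{14}x^{2}+\beta^{9}x+0.$$

The first and last steps of the long division are:

$$
\begin{array}{rl}
   & \beta^{6}x^{14}+\beta^{1}x^{13}+\beta^{8}x^{12}+\cdots+\beta^{14}x^{4}+\beta^{3}x^{3}+\beta^{13}x^{2} \\
 - & \beta^{6}x^{14}+\beta^{11}x^{13}+\beta^{9}x^{12} \\ \hline
   & \beta^{6}x^{13}+\beta^{12}x^{12}+\beta^{2}x^{11} \\
 - & \beta^{6}x^{13}+\beta^{11}x^{12}+\beta^{9}x^{11} \\ \hline
   & \beta^{0}x^{12}+\beta^{11}x^{11}+\beta^{4}x^{10} \\
 - & \beta^{0}x^{12}+\beta^{5}x^{11}+\beta^{3}x^{10} \\ \hline
   & \vdots \\
   & \beta^{14}x^{4}+\beta^{14}x^{3}+\beta^{13}x^{2} \\
 - & \beta^{14}x^{4}+\beta^{4}x^{3}+\beta^{2}x^{2} \\ \hline
   & \beta^{9}x^{3}+\beta^{14}x^{2}+0x \\
 - & \beta^{9}x^{3}+\beta^{14}x^{2}+\beta^{12}x \\ \hline
   & \beta^{12}x+0 \\
 - & 0x+0 \\ \hline
   & \beta^{12}x+0
\end{array}
$$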
Hence, the remainder we seek is $\beta^{12}x+0$, and thus the
corresponding 15-symbol codeword is
$\beta^{6}\ \beta^{1}\ \beta^{8}\ \beta^{2}\ \beta^{4}\ \beta^{5}\ \beta^{12}\ \beta^{7}\ \beta^{9}\ \beta^{11}\ \beta^{14}\ \beta^{3}\ \beta^{13}\ \beta^{12}\ 0$,
where the first thirteen symbols represent the information
symbols and the last two symbols represent the parity
symbols. [Ref. 9]
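
The encoding example above can be reproduced in software. The following is a minimal sketch, assuming the same integer representation of field elements (bit i is the coefficient of the i-th power of beta); the function names `poly_divmod`, `poly_eval`, and `encode` are hypothetical and are not taken from [Ref. 9]. It divides f(x) by g(x), appends the remainder as parity, and then checks that the resulting codeword polynomial vanishes at the two roots of g(x).

```python
# GF(2^4) tables for beta^4 = beta + 1 (bit i = coefficient of beta^i).
EXP, LOG = [], {}
_value = 1
for _power in range(15):
    EXP.append(_value)
    LOG[_value] = _power
    _value <<= 1
    if _value & 0b10000:
        _value ^= 0b10011


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 15]


def poly_divmod(dividend, divisor):
    """Long division by a monic divisor; coefficients highest degree first."""
    work = list(dividend)
    steps = len(dividend) - len(divisor) + 1
    for k in range(steps):
        q = work[k]                       # next quotient term
        for j in range(1, len(divisor)):  # subtract q * divisor (XOR)
            work[k + j] ^= gf_mul(q, divisor[j])
    return work[:steps], work[steps:]


def poly_eval(coeffs, x):
    """Horner evaluation of a polynomial at the field element x."""
    acc = 0
    for c in coeffs:
        acc = gf_mul(acc, x) ^ c
    return acc


def encode(info, generator):
    """Systematic encoding: information symbols followed by parity symbols."""
    parity_len = len(generator) - 1
    _, remainder = poly_divmod(list(info) + [0] * parity_len, generator)
    return list(info) + remainder


G = [1, EXP[5], EXP[3]]                            # x^2 + beta^5 x + beta^3
INFO = [EXP[e] for e in (6, 1, 8, 2, 4, 5, 12, 7, 9, 11, 14, 3, 13)]

codeword = encode(INFO, G)
print(["0" if c == 0 else f"beta^{LOG[c]}" for c in codeword[-2:]])
print(poly_eval(codeword, EXP[1]), poly_eval(codeword, EXP[2]))
```

The first line printed shows the two parity symbols (the 12th power of beta, then 0), and the second line prints `0 0`, confirming that the codeword is a multiple of g(x).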

The architecture of the systolic implementation consists
of a regular array of identical cells. Division is
performed in a pipelined manner by simultaneously entering
the highest-order terms of the f(x) and g(x) polynomials on
the leftmost cell and generating the appropriate codeword
on the far right, as depicted in Figure 4.2. In fact, a
codeword can immediately follow the previous one without any
interruption in the pipeline flow. Likewise, the control is
also systolic. One control bit pipeline path will signal
the start of a new codeword; another will signal the start
of the division operation. Meanwhile, each cell of the
array will hold one term of the quotient. As a result, if d
represents the difference in degrees between the two
polynomials, then

$$d = \deg f(x) - \deg g(x)$$

and thus d+1 cells are required. For example,

$$\deg f(x) = 14, \qquad \deg g(x) = 2, \qquad d = 12 \ \text{(degree of quotient)}.$$

Figure 4.2 A Systolic Architecture

From our previous calculation, the quotient was
$(\beta^{6}x^{12}+\beta^{6}x^{11}+\beta^{0}x^{10}+\cdots+\beta^{14}x^{2}+\beta^{9}x+0)$.
Since it consists of thirteen terms, thirteen cells would be
needed. In general,

$$\deg f(x) = 2^{J}-2, \qquad \deg g(x) = 2E, \qquad d = 2^{J}-2E-2$$

and so the total number of cells required is d+1 or
$2^{J}-2E-1$. [Ref. 9]

The operation of each cell is simple and regular.
Essentially, it accomplishes one line of the normal division
by initially determining the specific term of the quotient,
multiplying by the divisor, subtracting the result from the
dividend, and finally passing along the divisor and partial
result to the next cell. More specifically, there are three
J-bit data paths and two 1-bit control paths, as shown in
Figure 4.3. The function of the C data path is to allow the
information symbols to pass through the array unchanged,
while the other two data paths, A and B, are for the
dividend and divisor, respectively. The register Q is set
at the start of the division and remains the same
throughout the polynomial division of one block. The
register B is used as a temporary storage device. While a
control bit accompanies the first byte of information to
signal the start of a new codeword, a preceding start bit,
at one-half the rate of the control bit, initiates the
division operation in each cell. In short, the above
architecture is simply a pipelined parallel processor which
is composed of a systolic array of identical cells, each
performing a finite-field multiplication and addition.
Since the layout is simple and regular, it is easily
replicated and economical to produce. [Ref. 9]

Figure 4.3 The Systolic Cell Structure
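
The division of labor among the cells can be illustrated with a purely functional model. The sketch below is an assumption-laden simplification, not the cell of Figure 4.3: it ignores clocking, the two control bits, and the C data path, and the names `cell` and `systolic_divide` are hypothetical. Each call to `cell` performs one line of the long division: it latches one quotient term Q, subtracts Q times the divisor from the partial result, and hands the shortened partial result to the next cell.

```python
# GF(2^4) tables for beta^4 = beta + 1 (bit i = coefficient of beta^i).
EXP, LOG = [], {}
_value = 1
for _power in range(15):
    EXP.append(_value)
    LOG[_value] = _power
    _value <<= 1
    if _value & 0b10000:
        _value ^= 0b10011


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 15]


def cell(partial, divisor):
    """One cell: latch quotient term Q, subtract Q * divisor, pass result on."""
    q = partial[0]                         # divisor is monic, so Q is the lead term
    out = list(partial[1:])
    for j in range(1, len(divisor)):
        out[j - 1] ^= gf_mul(q, divisor[j])
    return q, out


def systolic_divide(dividend, divisor):
    """Chain d + 1 cells, where d = deg(dividend) - deg(divisor)."""
    num_cells = len(dividend) - len(divisor) + 1
    partial, quotient = list(dividend), []
    for _ in range(num_cells):
        q, partial = cell(partial, divisor)
        quotient.append(q)                 # each cell holds one quotient term
    return quotient, partial


J, E = 4, 1
G = [1, EXP[5], EXP[3]]
F = [EXP[e] for e in (6, 1, 8, 2, 4, 5, 12, 7, 9, 11, 14, 3, 13)] + [0, 0]

quotient, remainder = systolic_divide(F, G)
print(len(quotient), 2**J - 2 * E - 1)     # number of cells, two ways
print(remainder == [EXP[12], 0])
```

For the RS (15,13) example the model uses thirteen cells, and the partial result leaving the last cell is the two-symbol remainder found earlier. A hardware realization would overlap these steps in time; this model only shows what each cell contributes.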

In Chapter VI the encoder and decoder for an RS code are 
described in greater detail with the encoding and decoding 
process carried out for a specific example. 
V. IMPLEMENTATION THEORY



A. BACKGROUND 

In this chapter we look at the theoretical concepts
behind the systolic implementation of an encoder and
decoder. We then apply these concepts to the actual
implementation in the subsequent chapter. There, the binary
case is presented first because of its simple architecture
and ease of understanding. It is then followed by the more
intricate Reed-Solomon case.

This chapter also discusses in depth the design of a
systolic array multiplier used in the RS encoder. Unlike
the binary case, which deals only with the elements 0 and 1
in the complete codeword, the Reed-Solomon codeword contains
symbols which lie in a larger field than GF(2). As a result,
the systolic array multiplier is considerably more detailed
and complicated than the binary design, which simply uses a
primitive binary shift register scheme.

B. PRIMITIVE BINARY SHIFT REGISTER DESIGN 

A primitive binary shift register is a series of
registers, each capable of containing a zero or a one. The
contents of the register all shift on a designated time
signal from an external clock. The contents of the newest
stage of the register are defined as a function of the
current contents of the register. Because these shift
registers utilize this feedback property, they are commonly
referred to as feedback shift registers, or as primitive
shift registers since the feedback is usually described by a
primitive polynomial [Ref. 10]. For example, the diagram in
Figure 5.1 describes a primitive shift register composed of
four registers, labeled 1, x, $x^{2}$, $x^{3}$, and one modulo 2
adder situated between registers 1 and x. Each register is
capable of storing one bit of binary information, i.e., a
"1" or a "0". The all-zero contents of the register is
typically prohibited. This restriction is placed on the
primitive shift register to ensure a change of state when a
new clock signal is received. The register steps from state
to state around a single cycle; therefore the length of a
primitive cycle is independent of its initial state and is
equal to $2^{m}-1$. The primitive shift register of Figure 5.1
will move through 15 distinct binary patterns before
repeating (see Table II). This primitive shift register is
said to have a cycle length of $2^{4}-1$ or 15. Moreover, since
all nonzero patterns are included in the cycle, it is called
a maximum-length cycle. In general, a primitive shift
register composed of m stages will generate a maximum-length
cycle of period $2^{m}-1$. It is possible for each value of m
to determine a primitive feedback function for the shift
register so that a maximum-length shift register sequence of
period $2^{m}-1$ is generated.
Figure 5.1 A 4-Stage Primitive Shift Register

TABLE II

REGISTER CONTENTS AFTER SUCCESSIVE CLOCK SIGNALS

TIME (t)      REGISTER CONTENTS      BETA POLYNOMIAL

$t_{0}$       1 0 0 0                $1$
$t_{1}$       0 1 0 0                $\beta$
$t_{2}$       0 0 1 0                $\beta^{2}$
$t_{3}$       0 0 0 1                $\beta^{3}$
$t_{4}$       1 1 0 0                $\beta^{4}$
$t_{5}$       0 1 1 0                $\beta^{5}$
$t_{6}$       0 0 1 1                $\beta^{6}$
$t_{7}$       1 1 0 1                $\beta^{7}$
$t_{8}$       1 0 1 0                $\beta^{8}$
$t_{9}$       0 1 0 1                $\beta^{9}$
$t_{10}$      1 1 1 0                $\beta^{10}$
$t_{11}$      0 1 1 1                $\beta^{11}$
$t_{12}$      1 1 1 1                $\beta^{12}$
$t_{13}$      1 0 1 1                $\beta^{13}$
$t_{14}$      1 0 0 1                $\beta^{14}$
$t_{15}$      1 0 0 0                $\beta^{15}$
Maximum-length cycles and maximum-length sequences have
broad applications in data communication systems and
computer simulation, while primitive shift registers
designed as division circuits have applications in coding
theory [Ref. 10]. It is the objective of this chapter to
utilize the concepts of the latter to propose an RS encoder
and decoder.

In order to generate a maximum-length cycle or sequence,
we need to understand the necessary component connections
given a primitive polynomial. That is, given an arbitrary
primitive polynomial, how do we design the shift register?
For the example of Figure 5.1, assume $p(x)=1+x+x^{4}$ is a
primitive polynomial over GF(2). We can consider $GF(2^{4})$ as
an algebra of polynomials modulo $p(x)=1+x+x^{4}$ and design a
register to produce a pattern cycle of length $2^{4}-1$. Using
four delay units (since we need a register unit for the
coefficient of each term $x^{t}$ with $0 \le t \le 3$), we need
only decide how the primitive polynomial affects the
feedback to know where to place the modulo 2 adder
components and where to make the necessary circuit
connections. The feedback is the coefficient of $x^{4}$, but in
this polynomial algebra $x^{4}=1+x$. Thus, the feedback goes to
the registers which contain the coefficients of the $x^{0}$ and
$x^{1}$ terms. Making these connections and supplying the
modulo 2 adder component where we have two inputs to the
register, we arrive at the shift register given in
Figure 5.1. Each step of the register is then equivalent to
multiplying the contents of the register by the primitive
element $\beta$. Thus, the sequence of contents is the powers
of $\beta$ modulo $(1+\beta+\beta^{4})$. In this way
multiplication of the elements of the field is produced
simply, as described in Chapter III, and the powers of
$\beta$ are as given in Table I of Chapter III.
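
The feedback rule just described can be simulated directly. The sketch below is illustrative only; the name `step` and the tuple layout (contents of registers 1, x, x^2, x^3, in that order) are assumptions made here to match the column order of Table II.

```python
def step(state):
    """Advance the 4-stage register one clock period.

    state holds the contents of registers 1, x, x^2, x^3. The bit leaving
    the x^3 register is the coefficient of x^4, which is fed back to the
    1 register and, through the modulo 2 adder, to the x register
    (because x^4 = 1 + x).
    """
    c0, c1, c2, c3 = state
    feedback = c3
    return (feedback, c0 ^ feedback, c1, c2)


state = (1, 0, 0, 0)          # any nonzero starting state is allowed
states = [state]
for _ in range(15):
    state = step(state)
    states.append(state)

for t, contents in enumerate(states):
    print(f"t{t:<2} ", " ".join(map(str, contents)))
```

Starting from 1 0 0 0, the sixteen printed rows reproduce the register contents of Table II, and the state at time 15 equals the state at time 0, which is the cycle length of 15 discussed above.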

C. CODING THEORY 

Suppose that we wish to transmit a sequence of binary
digits across a noisy channel. If we send a one, a one will
probably be received, and if we send a zero, a zero will
probably be received. Occasionally, the channel noise will
cause a transmitted one to be received as a zero or a
transmitted zero to be received as a one. Although we are
unable to prevent the channel from generating such errors,
we can reduce their undesirable effects with the use of
coding [Ref. 5]. The basic idea is simple. A set of k
message digits which we wish to transmit is concatenated
with r check digits. The entire block of n=k+r channel
digits then forms the transmitted codeword. Assuming that
the channel noise changes sufficiently few of these n
transmitted channel digits, the redundancy afforded by the r
check digits provides the receiver with sufficient
information to detect and correct the channel errors.
Figure 5.2 illustrates the basic idea of the encoding
process for an (n,k) encoder with n=15 and k=11. The
codeword is constructed in such a way that the message
digits appear at the far right. The error-correcting
capability of the generated code depends upon the number of
check bits added. To illustrate, the binary code constructed
using the encoder of Figure 5.2 is capable of correcting one
error when, for example, $n=2^{m}-1$ and $k=2^{m}-1-m$ for each
integer m > 2, the so-called Hamming single-error-correcting
code.

Figure 5.2 The Encoding Process
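
To make the (15,11) case concrete, the following sketch builds a cyclic Hamming code and corrects a single error. It rests on assumptions that the text does not state: the generator polynomial is taken to be 1 + x + x^4, a polynomial is stored as an integer whose bit i is the coefficient of x^i, and the message occupies the eleven high-order positions. The names `poly_mod`, `encode`, and `correct` are hypothetical.

```python
GENERATOR = 0b10011   # assumed generator polynomial 1 + x + x^4


def poly_mod(value, divisor=GENERATOR):
    """Remainder of binary polynomial division (XOR long division)."""
    width = divisor.bit_length()
    while value.bit_length() >= width:
        value ^= divisor << (value.bit_length() - width)
    return value


def encode(message):
    """Append r = 4 check bits to k = 11 message bits (n = 15)."""
    assert 0 <= message < 2**11
    shifted = message << 4
    return shifted | poly_mod(shifted)


def correct(received):
    """Correct at most one bit error using the syndrome."""
    syndrome = poly_mod(received)
    if syndrome == 0:
        return received
    for position in range(15):
        if poly_mod(1 << position) == syndrome:   # syndrome of x^position
            return received ^ (1 << position)
    return received


codeword = encode(0b10110011101)
corrupted = codeword ^ (1 << 9)        # flip one channel digit
print(correct(corrupted) == codeword)
```

Each of the 15 single-error patterns produces a distinct nonzero 4-bit syndrome, which is why four check digits suffice for n = 15; the same counting argument gives n = 2^m - 1 for m check digits.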

D. MINIMAL POLYNOMIALS 

In order for a code to correct every pattern of t or 
fewer channel errors, the codewords must be generated by a 
polynomial whose length is the product of at least t 
distinct minimal polynomials [Ref. 5]. Occasionally, extra 
error-correcting capability is possessed by words of a code
beyond the designed capacity of the code. To understand this
situation and the general error-correcting capacity of the
code, it is necessary that we discuss some of the
mathematical concepts and properties of minimal polynomials
before discussing the actual implementation.

A minimal polynomial for a primitive element $\beta$ over
GF(p) is the lowest degree irreducible monic (has leading
coefficient 1) polynomial M(x) with coefficients from GF(p)
such that $M(\beta) = 0$ [Ref. 11]. For example, the Galois
field $GF(2^4)$ is constructed using the primitive element
$\beta$, the root of the irreducible polynomial $1+x+x^4$.
Then the minimal polynomials for the elements $\beta$,
$\beta^3$, $\beta^5$, and $\beta^7$ are given in the
following table.



Element        Minimal Polynomial

0              $x$
1              $1+x$
$\beta$        $1+x+x^4$
$\beta^3$      $1+x+x^2+x^3+x^4$
$\beta^5$      $1+x+x^2$
$\beta^7$      $1+x^3+x^4$



Furthermore, in $GF(2^m)$, $\beta^i$ and $\beta^{2i}$ have
the same minimal polynomial. In general, if $\beta^i$ is a
root of a minimal polynomial then so is $\beta^{pi}$ (where p
is the characteristic of the ground field GF(p); in this
case p=2). To illustrate, let us substitute the elements
$\beta$ and $\beta^2$ into our minimal polynomial $1+x+x^4$.
Upon substitution of $\beta$ we obtain $1+\beta+\beta^4$.
Thus in $GF(2^4)$, $\beta^4 = \beta+1$ and $M(\beta) = 0$.
Likewise, upon substituting $\beta^2$ for x in the same
minimal polynomial we obtain $1+\beta^2+\beta^8$, which in
$GF(2^4)$ is also zero as can be seen in Table I of
Chapter III. Elements of the field with the same minimal
polynomial are called conjugates. In the same way, the
imaginary roots i and -i are referred to as conjugate
complex numbers; they both have the same minimal polynomial
$x^2+1$ over the reals [Ref. 11].
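
The substitution argument above can be checked numerically. The following is a minimal sketch, assuming field elements are stored as integers whose bit i is the coefficient of $\beta^i$; the helper names `build_exp_table` and `eval_binary_poly` are illustrative and do not come from the thesis or its references.

```python
# GF(2^4) built from the primitive polynomial 1 + x + x^4 (beta^4 = beta + 1).
# Assumption: an element is an int whose bit i is the coefficient of beta^i.

def build_exp_table(m=4, prim_poly=0b10011):
    """Return [beta^0, beta^1, ..., beta^(2^m - 2)] as integers."""
    table, value = [], 1
    for _ in range((1 << m) - 1):
        table.append(value)
        value <<= 1                  # multiply by beta
        if value & (1 << m):         # reduce with beta^m = lower-order terms
            value ^= prim_poly
    return table

EXP = build_exp_table()

def eval_binary_poly(exponents, k):
    """Evaluate the sum of x^e (e in exponents) at x = beta^k."""
    result = 0
    for e in exponents:
        result ^= EXP[(e * k) % 15]  # addition in GF(2^m) is XOR
    return result

# Powers k for which M(beta^k) = 0, with M(x) = 1 + x + x^4.
roots = [k for k in range(15) if eval_binary_poly([0, 1, 4], k) == 0]
print(roots)
```

The list printed is the set of exponents whose powers of $\beta$ are conjugates sharing the minimal polynomial $1+x+x^4$.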

From our preceding discussion, it is clear that $\beta$,
$\beta^2$, $(\beta^2)^2 = \beta^4$, and $(\beta^4)^2 = \beta^8$
all have the same minimal polynomial $1+x+x^4$. Likewise
$\beta^3$, $\beta^6$, $\beta^{12}$, and $\beta^{24} = \beta^9$
also have the same minimal polynomial $1+x+x^2+x^3+x^4$. We
see that the powers of $\beta$ fall into disjoint sets,
called cyclotomic cosets. In fact, all $\beta^j$ which are
elements of the same cyclotomic coset have the same minimal
polynomial. The cyclotomic coset containing $\beta^s$
consists of the following powers of $\beta$:

$$\{s,\ 2s,\ 2^2 s,\ 2^3 s,\ \ldots,\ 2^{m_s - 1} s\}$$

where $m_s$ is the smallest positive integer such that
$2^{m_s} s \equiv s \pmod{2^m - 1}$ [Ref. 11]. For example,
the cyclotomic cosets over $GF(2^4)$ are:



$C_0$ = {0}
$C_1$ = {1,2,4,8}
$C_3$ = {3,6,12,9}
$C_5$ = {5,10}
$C_7$ = {7,14,13,11}



Other cyclotomic coset decompositions for various values of 
m are listed in Table III. 
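
The coset decomposition is mechanical: start from the smallest unused exponent and keep doubling modulo $2^m - 1$ until the sequence closes. A short sketch follows; the function name `cyclotomic_cosets` is an illustrative choice, not something taken from [Ref. 11].

```python
def cyclotomic_cosets(m, p=2):
    """Cyclotomic cosets of p modulo p^m - 1, keyed by their smallest member."""
    n = p**m - 1
    seen, cosets = set(), {}
    for s in range(n):
        if s in seen:
            continue
        coset, j = [], s
        while j not in coset:        # multiply by p until the cycle closes
            coset.append(j)
            j = (j * p) % n
        cosets[s] = coset
        seen.update(coset)
    return cosets

for representative, members in cyclotomic_cosets(4).items():
    print(f"C_{representative} = {members}")
```

Calling `cyclotomic_cosets(3)`, `cyclotomic_cosets(5)`, or `cyclotomic_cosets(6)` gives the decompositions for the other field sizes.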

If we let $M^{(i)}(x)$ represent the minimal polynomial of
$\beta^i \in GF(p^m)$, it follows that if i is in the
cyclotomic coset $C_s$, then

$$M^{(i)}(x) = \prod_{j \in C_s} (x + \beta^j)$$

TABLE III
CYCLOTOMIC COSETS

OVER $GF(2^3)$             OVER $GF(2^5)$

$C_0$ = {0}                $C_0$ = {0}
$C_1$ = {1,2,4}            $C_1$ = {1,2,4,8,16}
$C_3$ = {3,6,5}            $C_3$ = {3,6,12,24,17}
                           $C_5$ = {5,10,20,9,18}
                           $C_7$ = {7,14,28,25,19}
                           $C_{11}$ = {11,22,13,26,21}
                           $C_{15}$ = {15,30,29,27,23}

OVER $GF(2^6)$

$C_0$ = {0}
$C_1$ = {1,2,4,8,16,32}
$C_3$ = {3,6,12,24,48,33}
$C_5$ = {5,10,20,40,17,34}
$C_7$ = {7,14,28,56,49,35}
$C_9$ = {9,18,36}
$C_{11}$ = {11,22,44,25,50,37}
$C_{13}$ = {13,26,52,41,19,38}
$C_{15}$ = {15,30,60,57,51,39}
$C_{21}$ = {21,42}
$C_{23}$ = {23,46,29,58,53,43}
$C_{27}$ = {27,54,45}
$C_{31}$ = {31,62,61,59,55,47}

which is analogous to the generator polynomial g(x) in our
generic architecture of the previous chapter. Moreover, by
utilizing various techniques beyond the scope of this
thesis, we may determine all the minimal polynomials of
elements in $GF(2^4)$, as depicted in Table IV. Using this
table we may construct all the Reed-Solomon codes of block
length 15 which correct t or fewer channel errors. These
codes have the following generator polynomials:

t=1   $g(x) = M^{(1)}(x) = 1+x+x^4$

t=2   $g(x) = M^{(1)}(x) \cdot M^{(3)}(x) = 1+x^4+x^6+x^7+x^8$

t=3   $g(x) = M^{(1)}(x) \cdot M^{(3)}(x) \cdot M^{(5)}(x) = 1+x+x^2+x^4+x^5+x^8+x^{10}$
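
These products can be confirmed with carry-less (modulo-2) polynomial multiplication. The sketch below assumes each binary polynomial is packed into an integer with bit i holding the coefficient of $x^i$; `gf2_poly_mul` and `exponents` are illustrative helper names.

```python
def gf2_poly_mul(a, b):
    """Multiply two polynomials over GF(2); bit i is the coefficient of x^i."""
    product = 0
    while b:
        if b & 1:
            product ^= a             # add (XOR) the shifted copy of a
        a <<= 1
        b >>= 1
    return product

def exponents(poly):
    """List the powers of x that appear in a packed binary polynomial."""
    return [i for i in range(poly.bit_length()) if (poly >> i) & 1]

M1 = 0b10011   # 1 + x + x^4
M3 = 0b11111   # 1 + x + x^2 + x^3 + x^4
M5 = 0b00111   # 1 + x + x^2

g2 = gf2_poly_mul(M1, M3)
g3 = gf2_poly_mul(g2, M5)
print(exponents(g2))   # [0, 4, 6, 7, 8]
print(exponents(g3))   # [0, 1, 2, 4, 5, 8, 10]
```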

Hence, the t-error correcting RS code of block length n is
then the cyclic code whose generator polynomial is the
product of the distinct minimal polynomials of $\beta$,
$\beta^2$, $\beta^3$, ..., $\beta^{2t-1}$, $\beta^{2t}$
[Ref. 5]. Of noteworthy interest is the fact that an RS code
over $GF(2^5)$ which is designed to correct up to 4 errors is
also able to correct 5 errors. This is because $M^{(9)}(x)$,
the minimal polynomial of $\beta^9$, is identical to
$M^{(5)}(x)$, the minimal polynomial of $\beta^5$. Similarly,
the 6 error-correcting RS code is identical to the 7
error-correcting code, just as the 8-to-14 error-correcting
codes of length 31 are all identical to the 15
error-correcting code. In a similar way, codes over
$GF(2^6)$ and $GF(2^7)$ are sometimes able to correct more
errors than they are designed to correct. The ability to
correct these extra
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TABLE IV 



MINIMAL POLYNOMIALS OF ELEMENTS IN $GF(2^4)$

$M^{(1)}(x) = M^{(2)}(x) = M^{(4)}(x) = M^{(8)}(x) = 1+x+x^4$
$M^{(3)}(x) = M^{(6)}(x) = M^{(12)}(x) = M^{(9)}(x) = 1+x+x^2+x^3+x^4$
$M^{(5)}(x) = M^{(10)}(x) = 1+x+x^2$
$M^{(7)}(x) = M^{(14)}(x) = M^{(13)}(x) = M^{(11)}(x) = 1+x^3+x^4$

error patterns depends upon finding higher powers of $\beta$
which belong to cyclotomic cosets for the smaller powers of
$\beta$ which belong to the code for the designed error
correcting
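distance. The tables of cyclotomic cosets for $GF(2^5)$ and
$GF(2^6)$ show, for example, that over $GF(2^5)$ $\beta^9$
belongs to the coset of $\beta^5$, while over $GF(2^6)$
$\beta^{17}$ belongs to the coset of $\beta^5$ and
$\beta^{19}$ belongs to the coset of $\beta^{13}$. See
[Ref. 11] for further discussion of the error correcting
capabilities of given error correcting codes.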

E. SYSTOLIC ARRAY MULTIPLIER 

As mentioned earlier in this chapter, the systolic array
multiplier used in the generation of Reed-Solomon codewords
is much more complex than in the binary case. In this
section, we discuss the design of a systolic array
multiplier developed by Yeh, Reed, and Truong [Ref. 7] to
assist us in our implementation of an RS encoder.

According to [Ref. 7] several circuits have been proposed
to realize multiplication in $GF(2^m)$. Unfortunately, these
circuits are not suited for use in VLSI systems, due to
irregular wire routing, complicated control problems,
nonmodular structure and lack of concurrency. The systolic
array multiplier of [Ref. 7] performs the multiplication in
the field $GF(2^m)$ in a way which overcomes some of these
unwanted attributes.

The systolic architecture is developed for performing
the product-sum computation, AB+C, in the finite field
$GF(2^m)$ of $2^m$ elements, where A, B, and C are arbitrary
elements of the field. The multiplier is a serial-in,
serial-out, one-dimensional systolic array which requires m
basic cells. To perform an isolated computation the
multiplier requires 3m time units; however, the average time
per computation is only m time units if a number of
computations are carried out consecutively. Because the
architecture is simple and regular and possesses the
desirable properties of concurrency and modularity, it is
well suited for VLSI implementation. [Ref. 7]

Consider the nonzero elements of $GF(2^m)$. They can be
represented as the powers of $\beta$, a primitive element of
the field as discussed in Chapter III. Since $F(\beta) = 0$,

$$\beta^m = f_{m-1}\beta^{m-1} + \ldots + f_1\beta + f_0,$$

where the coefficients $f_i$ are determined by the
polynomial F(x) which $\beta$ satisfies. Therefore an element
of $GF(2^m)$ is of the form
$a_{m-1}\beta^{m-1} + \ldots + a_1\beta + a_0$ where
$a_i \in GF(2)$ for $0 \le i \le m-1$. In the following
discussion, the polynomial representation is used to
represent the finite field $GF(2^m)$.

Let $A = a_{m-1}\beta^{m-1} + \ldots + a_1\beta + a_0$ and
$B = b_{m-1}\beta^{m-1} + \ldots + b_1\beta + b_0$ be two
elements in $GF(2^m)$. Then
$A + B = S_{m-1}\beta^{m-1} + \ldots + S_1\beta + S_0$, where
$S_i = a_i + b_i \pmod 2$ for $0 \le i \le m-1$. Therefore
addition in $GF(2^m)$ is realized easily by m independent
Exclusive-OR gates.

Suppose $P = p_{m-1}\beta^{m-1} + \ldots + p_1\beta + p_0$
is the product of A and B, i.e., P=AB. Then P can be written
as follows:

$$P = \sum_{k=0}^{m-1} b_k (A\beta^k) = \sum_{k=0}^{m-1} b_k \sum_{n=0}^{m-1} a_n^{(k)} \beta^n = \sum_{n=0}^{m-1} \Big( \sum_{k=0}^{m-1} a_n^{(k)} b_k \Big) \beta^n \qquad (1)$$

where $a_n^{(k)}$ is the coefficient of $\beta^n$ in
$A\beta^k$, i.e.,
$A\beta^k = a_{m-1}^{(k)}\beta^{m-1} + \ldots + a_1^{(k)}\beta + a_0^{(k)}$
for $0 \le k \le m-1$. From equation (1) we obtain
$p_n = a_n^{(0)} b_0 + a_n^{(1)} b_1 + \ldots + a_n^{(m-2)} b_{m-2} + a_n^{(m-1)} b_{m-1}$.
The computation of $A\beta^k$ can be performed recursively on
k for $0 \le k \le m-1$. Initially for k=0, $A\beta^0 = A$,
i.e., $a_n^{(0)} = a_n$ for $0 \le n \le m-1$. For
$1 \le k \le m-1$,



$$A\beta^k = (A\beta^{k-1})\beta = \sum_{n=0}^{m-1} a_n^{(k-1)} \beta^{n+1} = a_{m-1}^{(k-1)} \beta^m + \sum_{n=1}^{m-1} a_{n-1}^{(k-1)} \beta^n \qquad (2)$$

Substituting $\beta^m = f_{m-1}\beta^{m-1} + \ldots + f_1\beta + f_0$
into equation (2) yields

$$A\beta^k = \sum_{n=1}^{m-1} \Big( a_{n-1}^{(k-1)} + a_{m-1}^{(k-1)} f_n \Big) \beta^n + a_{m-1}^{(k-1)} f_0 \qquad (3)$$

From equation (3), we obtain

$$a_n^{(k)} = a_{n-1}^{(k-1)} + a_{m-1}^{(k-1)} f_n \quad \text{for } 1 \le n \le m-1$$

$$a_0^{(k)} = a_{m-1}^{(k-1)} f_0$$
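
The recursion can be exercised in software before considering the hardware. What follows is a bit-level sketch of the product-sum P=AB+C built directly from equations (1) through (3); it is a sequential model, not the pipelined systolic schedule of [Ref. 7], and the function name `gf_mult_add` and the list-of-bits representation are assumptions made for illustration.

```python
def gf_mult_add(a, b, c, f, m):
    """Return the bits of P = A*B + C in GF(2^m).

    a, b, c: lists of m bits; index n is the coefficient of beta^n.
    f: list of m bits with beta^m = f[m-1] beta^(m-1) + ... + f[0].
    """
    p = list(c)        # partial sums start from the bits of C
    a_k = list(a)      # a_n^(0) = a_n
    for k in range(m):
        # Equation (1): accumulate b_k * (A beta^k) into the partial sums.
        for n in range(m):
            p[n] ^= a_k[n] & b[k]
        # Equation (3): step from A beta^k to A beta^(k+1).
        top = a_k[m - 1]
        a_k = [top & f[0]] + [a_k[n - 1] ^ (top & f[n]) for n in range(1, m)]
    return p

F = [1, 1, 0, 0]                       # beta^4 = 1 + beta in GF(2^4)
beta = [0, 1, 0, 0]
beta_cubed = [0, 0, 0, 1]
zero = [0, 0, 0, 0]
print(gf_mult_add(beta, beta_cubed, zero, F, 4))   # [1, 1, 0, 0], i.e. 1 + beta
```

The printed result is $\beta \cdot \beta^3 = \beta^4 = 1 + \beta$, as expected from the primitive polynomial $1+x+x^4$.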

Table V indicates the step-by-step procedure for
computing P=AB+C in $GF(2^4)$. In Table V $a_n^{(k)}$, $b_n$,
$c_n$, $f_n$, and $p_n$ are the n-th bits of $A\beta^k$, B,
C, F, and P, respectively, where F is the primitive
polynomial and $p_n^{(i)}$ is the partial sum of $p_n$.

Figure 5.3 depicts the systolic multiplier for our given
finite field. The primitive polynomial is
$F = f_3\beta^3 + f_2\beta^2 + f_1\beta + f_0$. Input $d_n$
receives the bit $b_n$ of B. The n-th bits $c_n$, $a_n$, and
$f_n$ of C, A, and F, respectively, are received serially at
inputs $e_0$, $g_0$, and $h_0$. Two control signals, START
(0001) and END (0111), are used in the design, with inputs
$r_0$ and $t_0$ receiving the signals, respectively.

Output $e_4$ serially transmits the n-th bit, $p_n$, of the
result P out of the system. The order of the inputs and the
outputs is also shown in Figure 5.3. The flip-flops (FF)
associated with inputs $t_0$ and $h_0$ are used for the
purpose of synchronization.

The circuit of cell i is shown in Figure 5.4. The
operation of the flip-flops in the system is synchronized
implicitly by a clock signal. When $r_i^* = 1$,
$u_i = g_i^*$ at the next time unit (through switch SW).
Additionally, when
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TABLE V 



COMPUTATION OF P = AB + C IN $GF(2^4)$

STEP
NUMBER   OPERATIONS

1        $p_3^{(0)} = c_3$;  $a_3^{(0)} = a_3$

2        $p_3^{(1)} = p_3^{(0)} + a_3^{(0)} b_0$;  $p_2^{(0)} = c_2$;  $a_2^{(0)} = a_2$

3        $p_2^{(1)} = p_2^{(0)} + a_2^{(0)} b_0$;  $p_1^{(0)} = c_1$;  $a_1^{(0)} = a_1$;
         $a_3^{(1)} = a_2^{(0)} + a_3^{(0)} f_3$

4        $p_3^{(2)} = p_3^{(1)} + a_3^{(1)} b_1$;  $p_1^{(1)} = p_1^{(0)} + a_1^{(0)} b_0$;
         $p_0^{(0)} = c_0$;  $a_0^{(0)} = a_0$;  $a_2^{(1)} = a_1^{(0)} + a_3^{(0)} f_2$

5        $p_2^{(2)} = p_2^{(1)} + a_2^{(1)} b_1$;  $p_0^{(1)} = p_0^{(0)} + a_0^{(0)} b_0$;
         $a_1^{(1)} = a_0^{(0)} + a_3^{(0)} f_1$;  $a_3^{(2)} = a_2^{(1)} + a_3^{(1)} f_3$

6        $p_3^{(3)} = p_3^{(2)} + a_3^{(2)} b_2$;  $p_1^{(2)} = p_1^{(1)} + a_1^{(1)} b_1$;
         $a_0^{(1)} = a_3^{(0)} f_0$;  $a_2^{(2)} = a_1^{(1)} + a_3^{(1)} f_2$

7        $p_2^{(3)} = p_2^{(2)} + a_2^{(2)} b_2$;  $p_0^{(2)} = p_0^{(1)} + a_0^{(1)} b_1$;
         $a_1^{(2)} = a_0^{(1)} + a_3^{(1)} f_1$;  $a_3^{(3)} = a_2^{(2)} + a_3^{(2)} f_3$

8        $p_3 = p_3^{(4)} = p_3^{(3)} + a_3^{(3)} b_3$;  $p_1^{(3)} = p_1^{(2)} + a_1^{(2)} b_2$;
         $a_0^{(2)} = a_3^{(1)} f_0$;  $a_2^{(3)} = a_1^{(2)} + a_3^{(2)} f_2$

9        $p_2 = p_2^{(4)} = p_2^{(3)} + a_2^{(3)} b_3$;  $p_0^{(3)} = p_0^{(2)} + a_0^{(2)} b_2$;
         $a_1^{(3)} = a_0^{(2)} + a_3^{(2)} f_1$

10       $p_1 = p_1^{(4)} = p_1^{(3)} + a_1^{(3)} b_3$;  $a_0^{(3)} = a_3^{(2)} f_0$

11       $p_0 = p_0^{(4)} = p_0^{(3)} + a_0^{(3)} b_3$
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Figure 5.3 A Systolic Multiplier for the Finite Field GF(2 
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Figure 5.4 The Circuit of the Cell 







$r_i^* = 0$, $u_i$ retains its value. Two principal
operations of the system are the following:

$$e_{i+1} \leftarrow (g_i^* d_i) \oplus e_i^*$$

$$g_{i+1} \leftarrow (u_i h_i^*) \oplus (g_i^* t_i^*)$$

where $0 \le i \le 3$, $\oplus$ denotes the Exclusive-OR
operation, i.e., modulo-2 addition, and the backwards arrow
denotes the substitution operation.

A comparison of the procedure in Table V and the
structure in Figures 5.3 and 5.4 yields the following facts:
The signal $u_i$ in cell i is equal to $a_3^{(i)}$ in
$A\beta^i$. The signal $g_i^*$ is equal to $a_n^{(i)}$ in
$A\beta^i$ for some n. The signal $e_i^*$ is equal to the
partial sum of AB+C.

The multiplier in Figure 5.3 can be generalized to the
finite field $GF(2^m)$ by simply concatenating m identical
cells. Furthermore, additional registers and control signals
would be required if the $b_i$'s are fed serially into the
system in the same manner as the $a_i$'s. [Ref. 7]
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VI. IMPLEMENTATION 



A. BINARY ENCODER 

In this section we discuss the encoding process for a 
binary code and utilize a primitive shift register design to 
implement both a single-error-correcting binary encoder and 
a double-error-correcting binary encoder. 

1. Encoding Process

As discussed in Chapter V, an (n,k) code can be
generated with a polynomial of degree n-k. If the polynomial
is primitive of degree r and $n = 2^r - 1$, the code can be
encoded and decoded with primitive shift registers. Hence,
we restrict our attention solely to the case of primitive
polynomials.

We illustrate this procedure by generating the
(15,11) binary code using the primitive polynomial
$p(x) = 1+x+x^4$. Here n-k=4, r=4, $n = 2^4 - 1 = 15$, and
k=11. The encoding process for the 11-bit message
10101010101 proceeds as in the example below.

Example of Encoding Process:

Message = 10101010101

1) Represent the message as a polynomial.
   $m(x) = 1+x^2+x^4+x^6+x^8+x^{10}$

2) Multiply m(x) by $x^{n-k}$ to shift the message digits to
   the far right.
   $x^4 m(x) = x^4+x^6+x^8+x^{10}+x^{12}+x^{14}$

3) Calculate the remainder when $x^{n-k} m(x)$ is divided by
   p(x).
   $r(x) = 1+x+x^3$

4) Form the code polynomial as the sum $x^{n-k} m(x) + r(x)$,
   a multiple of p(x).
   $c(x) = 1+x+x^3+x^4+x^6+x^8+x^{10}+x^{12}+x^{14}$

Code Word = 110110101010101



Note that codewords in this code are formed as multiples of
the primitive generating polynomial p(x). As p(x) is of
degree r there are n-r=k information symbols which can be
chosen freely, and then r check symbols are chosen so that
the resulting codeword satisfies this criterion, namely that
the codewords are multiples of the generator polynomial. In
other words, the check digits are the coefficients of the
remainder r(x) upon division of $x^{n-k} m(x)$ by p(x) as
shown below.



                  x^10 + x^8 + x^7 + x^5 + x^4 + x^3 + 1
              ------------------------------------------
x^4 + x + 1 ) x^14 + x^12 + x^10 + x^8 + x^6 + x^4
              x^14 + x^11 + x^10
              ------------------
              x^12 + x^11 + x^8 + x^6 + x^4
              x^12 + x^9 + x^8
              ----------------
              x^11 + x^9 + x^6 + x^4
              x^11 + x^8 + x^7
              ----------------
              x^9 + x^8 + x^7 + x^6 + x^4
              x^9 + x^6 + x^5
              ---------------
              x^8 + x^7 + x^5 + x^4
              x^8 + x^5 + x^4
              ---------------
              x^7
              x^7 + x^4 + x^3
              ---------------
              x^4 + x^3
              x^4 + x + 1
              -----------
              x^3 + x + 1

2. Single-Error-Correcting Binary Encoder



By utilizing the previously discussed concepts, we
may now describe the encoding process of the binary (15,11)
code as implemented in a primitive shift register shown in
Figure 6.1. By simply feeding in the message m(x) at the
$x^4$ stage we are able to simulate the effect of multiplying
m(x) by $x^4$. The switch remains in position 1 as m(x) is
fed completely into the shift register. The shift register
computes the remainder when $x^4 m(x)$ is divided by p(x), as
the shift register is in essence a division circuit. The
register contents after the information bits have all been
fed into the register are the remainder after division of
the shifted information polynomial $x^4 m(x)$ by the
generator polynomial p(x). In the example the remainder is
$1101 = 1+x+x^3$. The switch is then changed to position 2 to
allow the check digits to follow the message digits,
producing the coded output 110110101010101 for the example
given. [Ref. 10]
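
The behavior of this division register can be imitated in a few lines. The sketch below assumes the message digits enter highest-order first and that the register stages hold the coefficients of $1, x, x^2, x^3$ from left to right; `lfsr_encode` and its `taps` argument are illustrative names, not part of [Ref. 10].

```python
def lfsr_encode(message_bits, taps=(1, 1, 0, 0)):
    """Systematic encoding with a division register for p(x).

    message_bits: coefficients m_0 .. m_(k-1) of m(x).
    taps: coefficients p_0 .. p_(r-1) of p(x); the leading coefficient is 1.
    Returns (check_bits, codeword), both listed lowest order first.
    """
    r = len(taps)
    state = [0] * r
    for bit in reversed(message_bits):          # highest-order digit first
        feedback = bit ^ state[-1]              # message fed at the x^r stage
        state = [feedback & taps[0]] + [
            state[i - 1] ^ (feedback & taps[i]) for i in range(1, r)
        ]
    return state, state + list(message_bits)

message = [int(ch) for ch in "10101010101"]
check, codeword = lfsr_encode(message)
print(check)                              # [1, 1, 0, 1], i.e. 1 + x + x^3
print("".join(map(str, codeword)))        # 110110101010101
```

The final register contents match the remainder $1+x+x^3$ of the worked example, and the codeword is the check digits followed by the message digits.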

3. Double-Error-Correcting Binary Encoder

To design a binary encoder which corrects up to two
errors, additional redundancy must be added. Since we are
now concerned with correction of up to two errors, the
generator polynomial is the product of the two distinct
minimal polynomials $M^{(1)}(x)$ and $M^{(3)}(x)$ as
described in the previous chapter. Their product is the
polynomial $1+x^4+x^6+x^7+x^8$. The implementation of the
encoder is carried out in essentially the same manner as its



TIME (t)   SWITCH        REGISTER CONTENTS   OUTPUT

t = 0      INITIAL       0 0 0 0             -
t = 1      POSITION 1    1 1 0 0             1
t = 2      POSITION 1    0 1 1 0             0
t = 3      POSITION 1    1 1 1 1             1
t = 4      POSITION 1    1 0 1 1             0
t = 5      POSITION 1    0 1 0 1             1
t = 6      POSITION 1    1 1 1 0             0
t = 7      POSITION 1    1 0 1 1             1
t = 8      POSITION 1    1 0 0 1             0
t = 9      POSITION 1    0 1 0 0             1
t = 10     POSITION 1    0 0 1 0             0
t = 11     POSITION 1    1 1 0 1             1
t = 12     POSITION 2    0 1 1 0             1
t = 13     POSITION 2    0 0 1 1             0
t = 14     POSITION 2    0 0 0 1             1
t = 15     POSITION 2    0 0 0 0             1

Figure 6.1 A Single-Error-Correcting Binary Encoder

single-error counterpart. The encoder is presented in
Figure 6.2. Now n=15 and k=15-8=7 so that there are a
smaller number of codewords ($2^7$) in this more powerful
code. As the error correcting capability of the code
increases, the number of information bits correspondingly
decreases.



B. REED-SOLOMON ENCODER 

In this section we draw upon the work of Liu [Ref. 8] 
and our acquired knowledge of finite field theory and Reed- 
Solomon codes to produce an RS encoder. 

As discussed in Chapter IV, an RS codeword has $(2^J - 1)$
symbols each of which is J bits wide. Of the $(2^J - 1)$
symbols there are $(2^J - 1 - 2E)$ information symbols and 2E
parity-check symbols, where E is the number of symbol errors
the RS code is able to correct. If we treat the
$(2^J - 1 - 2E)$ information symbols as the coefficients of
the polynomial

$$f(x) = \sum_{i=1}^{2^J-1-2E} f_i\, x^{2E+i-1} = f_1 x^{2E} + \ldots + f_{2^J-1-2E}\, x^{2^J-2}$$

then the 2E parity-check symbols can be obtained as the
coefficients of the remainder of f(x)/g(x) where g(x) is the
generator polynomial of the code. Usually, g(x) is defined
as

$$g(x) = \prod_{i=1}^{2E} (x + \beta^i) = \sum_{j=0}^{2E} g_j x^j$$
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Figure 6.2 A Double-Error-Correcting Binary Encoder 



where $\beta$ is a primitive element of the Galois field
$GF(2^J)$ and the $g_j$'s are the coefficients of g(x) with
$g_{2E} = 1$.

A diagram of the RS encoder which generates the
remainder of f(x)/g(x) is given in Figure 6.3. It is
composed of 2E systolic array multipliers, 2E "exclusive-or"
adders, and 2E shift registers. The coefficients of the
generator polynomial g(x) are fed into their respective
systolic multipliers where the finite field multiplication
A*B occurs, as discussed in Chapter V. Upon completion the
partial product is "exclusive-or'ed" with the contents C
of the previous shift register and distributed down the line
to the next shift register in a pipeline fashion. The
switches are normally in the "ON" position until the last
information symbol goes into the encoder. At this moment
all the switches are turned to the "OFF" position and the
encoder behaves like a long shift register. The output of
the encoder is then taken from the output of the last shift
register. [Ref. 8]
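
A software analogue of this encoder may help fix ideas. The sketch below works over $GF(2^4)$ with E=2, uses log/antilog tables in place of the systolic multipliers, and places the information symbols in the coefficients of $x^{2E}$ through $x^{N-1}$; all function names and the sample information symbols are assumptions for illustration, not Liu's design.

```python
PRIM, M = 0b10011, 4                 # beta^4 = beta + 1
N = (1 << M) - 1                     # codeword length 2^J - 1 = 15
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
value = 1
for i in range(N):                   # antilog and log tables for GF(2^4)
    EXP[i] = EXP[i + N] = value
    LOG[value] = i
    value <<= 1
    if value & (1 << M):
        value ^= PRIM

def gf_mul(a, b):
    """Multiply two field elements (ints) using the log tables."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def generator_poly(E):
    """Coefficients g_0 .. g_2E of the product of (x + beta^i), i = 1 .. 2E."""
    g = [1]
    for i in range(1, 2 * E + 1):
        shifted = [0] + g                                # x * g(x)
        scaled = [gf_mul(c, EXP[i]) for c in g] + [0]    # beta^i * g(x)
        g = [s ^ t for s, t in zip(shifted, scaled)]
    return g

def rs_encode(info, E):
    """Return the codeword (lowest order first): 2E parity symbols, then info."""
    g = generator_poly(E)
    reg = [0] * (2 * E)
    for sym in reversed(info):                           # highest order first
        feedback = sym ^ reg[-1]
        reg = [gf_mul(feedback, g[0])] + [
            reg[j - 1] ^ gf_mul(feedback, g[j]) for j in range(1, 2 * E)
        ]
    return reg + list(info)

def poly_eval(coeffs, x):
    """Evaluate a polynomial with field coefficients at x (Horner's rule)."""
    result = 0
    for c in reversed(coeffs):
        result = gf_mul(result, x) ^ c
    return result

codeword = rs_encode(list(range(1, 12)), E=2)            # arbitrary sample data
print(all(poly_eval(codeword, EXP[i]) == 0 for i in range(1, 5)))
```

A valid codeword is a multiple of g(x), so it should evaluate to zero at each of $\beta, \beta^2, \ldots, \beta^{2E}$; the final line checks exactly that.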

C. BINARY DECODER 

In this section we discuss the decoding process and 
design a single-error-correcting binary decoder and a 
double-error-correcting binary decoder both of which can be 
used in conjunction with the binary encoders of Section A of 
this chapter. 
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Figure 6.3 The RS Encoder 



1. Decoding Process



The decoding process is, in general, much more 
complicated than the encoding process. Not only must we 
deal with the detection of errors but also with their 
correction. As a result, we must be able to design a 
decoder which simultaneously detects and corrects errors. 

Error detection is usually much easier than error 
correction. Recall that a code polynomial is a multiple of 
the generating polynomial p(x). In other words, the 
received polynomial u(x) will be a code polynomial if and 
only if the remainder upon division of u(x) by p(x) is zero, 
i.e., u(x) = 0 modulo p(x). An example is given in 
Table VI. The register contents after u(x) is fed com- 
pletely into the detecting division register will contain 
u(x) modulo p(x). If any of the register contents are 
nonzero, u(x) is not a valid codeword. Thus the shift 
register acts as an error detector by performing a division 
of u(x) by p(x). In fact, the nonzero contents not only 
indicate that an error has occurred in transmission, but 
those contents also indicate the error pattern needed to 
correct the error and the location of the error in the 
transmitted codeword. [Ref. 10] 

2 . Single-Error-Correcting Binary Decoder 

Because of the complexity of the decoding process, 
we will initially design an error detection register 
followed by its error correction counterpart and then 
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TABLE VI 



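VERIFICATION OF THE CODE POLYNOMIAL

                  x^10 + x^8 + x^7 + x^5 + x^4 + x^3 + 1
              ----------------------------------------------------
x^4 + x + 1 ) x^14 + x^12 + x^10 + x^8 + x^6 + x^4 + x^3 + x + 1
              x^14 + x^11 + x^10
              ------------------
              x^12 + x^11 + x^8 + x^6 + x^4 + x^3 + x + 1
              x^12 + x^9 + x^8
              ----------------
              x^11 + x^9 + x^6 + x^4 + x^3 + x + 1
              x^11 + x^8 + x^7
              ----------------
              x^9 + x^8 + x^7 + x^6 + x^4 + x^3 + x + 1
              x^9 + x^6 + x^5
              ---------------
              x^8 + x^7 + x^5 + x^4 + x^3 + x + 1
              x^8 + x^5 + x^4
              ---------------
              x^7 + x^3 + x + 1
              x^7 + x^4 + x^3
              ---------------
              x^4 + x + 1
              x^4 + x + 1
              -----------
              0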
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synthesize them together to implement the complete decoder. 
To begin, we utilize the error detection register of Figure 
6.4. It is identical to the encoding register of Figure 6.1 
except that the received codeword is input to the decoder at 
the left end of the register. If the received word is 
110111101010101, then the nonzero contents 0110 after 
division indicate that an error has occurred in trans- 
mission. In order to correct the received word we need to 
know the error position. 

The received word can be viewed as a polynomial u(x) 
which can be written as the sum of the code polynomial c(x) 
and an error polynomial e(x), namely u(x) = c(x) + e(x). 
The error polynomial e(x) has ones in its error positions 
and zeros elsewhere, and addition is term by term modulo 2. 

Since the codewords c(x) are generated as multiples
of the generator polynomial g(x) and since $\beta$ is a root
of g(x), the code polynomials evaluated at $\beta$ are equal
to zero, namely $c(\beta) = 0$. Thus
$u(\beta) = c(\beta) + e(\beta) = e(\beta)$. Since we assume
in this sub-section that only single errors have occurred in
transmission we can also assume that if an error occurs then
e(x) is a power of x, say $e(x) = x^i$ for some i. Thus
$u(\beta) = e(\beta) = \beta^i$.

In order to correct the error we need to compute
$u(\beta)$, which is called the syndrome of the received
word, and then find the specific value i for which
$u(\beta) = \beta^i$. The value i will indicate the error
position. We need then only set $c(x) = u(x) + x^i$ to obtain
the code polynomial c(x), a multiple of p(x), which is
"nearest" to the received polynomial u(x). The primitive
shift register facilitates this task because while it is
computing u(x) modulo p(x) it also leaves the coefficients of
$u(\beta) = \beta^i$ in the shift register. [Ref. 10]

TIME (t)   REGISTER CONTENTS

t = 0      0 0 0 0
t = 1      1 0 0 0
t = 2      0 1 0 0
t = 3      1 0 1 0
t = 4      0 1 0 1
t = 5      0 1 1 0
t = 6      0 0 1 1
t = 7      0 1 0 1
t = 8      1 1 1 0
t = 9      1 1 1 1
t = 10     0 0 1 1
t = 11     0 1 0 1
t = 12     0 1 1 0
t = 13     0 0 1 1
t = 14     0 1 0 1
t = 15     0 1 1 0 = ERROR

Figure 6.4 The Error Detection Register

For example, in Figure 6.5 the primitive shift register
computes
$u(x) = 1+x+x^3+x^4+x^5+x^6+x^8+x^{10}+x^{12}+x^{14}$ modulo
$p(x) = 1+x+x^4$ and the syndrome is $0110 = x+x^2$. Note
from Table I of Chapter III that 0110 is the 4-digit
representation of $\beta^5$. Hence the error in the received
polynomial occurs in the position of $x^5$. Therefore, the
code polynomial is
$c(x) = u(x) + e(x) = (1+x+x^3+x^4+x^5+x^6+x^8+x^{10}+x^{12}+x^{14}) + x^5 = 1+x+x^3+x^4+x^6+x^8+x^{10}+x^{12}+x^{14}$.
The corrected codeword is 110110101010101 and the corrected
information symbols are 10101010101. The same procedure is
also illustrated in Table VII by the actual long division
process.
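
The example can be reproduced with a direct polynomial division followed by a search through the powers of $\beta$. The sketch below uses a table lookup to find the error position (the register-based method comes later in the text); `poly_mod`, `powers_of_beta`, and `correct_single_error` are hypothetical helper names, and words are read as coefficient strings starting with the constant term.

```python
P = [1, 1, 0, 0, 1]                  # p(x) = 1 + x + x^4, lowest order first

def poly_mod(bits, p_bits):
    """Remainder of a binary polynomial modulo p(x); index = power of x."""
    rem = list(bits)
    r = len(p_bits) - 1
    for i in range(len(rem) - 1, r - 1, -1):
        if rem[i]:                   # cancel the x^i term with x^(i-r) p(x)
            for j, pj in enumerate(p_bits):
                rem[i - r + j] ^= pj
    return rem[:r]

def powers_of_beta(n=15):
    """4-digit representations of beta^0 .. beta^(n-1)."""
    table, state = [], [1, 0, 0, 0]
    for _ in range(n):
        table.append(state)
        state = poly_mod([0] + state, P)   # multiply by x, then reduce
    return table

def correct_single_error(received):
    """Return (syndrome, corrected word) for a received 15-digit string."""
    word = [int(ch) for ch in received]
    syndrome = poly_mod(word, P)
    if any(syndrome):
        position = powers_of_beta().index(syndrome)   # syndrome = beta^i
        word[position] ^= 1
    return syndrome, "".join(map(str, word))

print(correct_single_error("110111101010101"))
# ([0, 1, 1, 0], '110110101010101')
```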

We now examine the primitive shift register decoding 
process which performs the error correction. After the 
syndrome is computed by the primitive shift register 
division process, an additional primitive shift register of 
the same type can be used to correct the error without 
reference to a table of powers of the primitive element g. 
The correcting register shown in Figure 6.6 is basically the 
same primitive shift register used throughout this chapter 
with the exception that there are output lines leading from 






Figure 6.5 The Error Correction Register 




TABLE VII 



SYNDROME CALCULATION USING LONG DIVISION 



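                  x^10 + x^8 + x^7 + x^5 + x^4 + x^3 + x + 1
              ----------------------------------------------------------
x^4 + x + 1 ) x^14 + x^12 + x^10 + x^8 + x^6 + x^5 + x^4 + x^3 + x + 1
              x^14 + x^11 + x^10
              ------------------
              x^12 + x^11 + x^8 + x^6 + x^5 + x^4 + x^3 + x + 1
              x^12 + x^9 + x^8
              ----------------
              x^11 + x^9 + x^6 + x^5 + x^4 + x^3 + x + 1
              x^11 + x^8 + x^7
              ----------------
              x^9 + x^8 + x^7 + x^6 + x^5 + x^4 + x^3 + x + 1
              x^9 + x^6 + x^5
              ---------------
              x^8 + x^7 + x^4 + x^3 + x + 1
              x^8 + x^5 + x^4
              ---------------
              x^7 + x^5 + x^3 + x + 1
              x^7 + x^4 + x^3
              ---------------
              x^5 + x^4 + x + 1
              x^5 + x^2 + x
              -------------
              x^4 + x^2 + 1
              x^4 + x + 1
              -----------
              x^2 + x

SYNDROME: $x + x^2 = 0110 = \beta^5$

Digit positions, left to right: $1, \beta, \beta^2, \ldots, \beta^{14}$

110111101010101 = ERROR CODEWORD

110110101010101 = CORRECTED CODEWORD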
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Figure 6.6 The Initial Single-Error-Correcting Binary Decoder 
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each of the four registers. If the correcting register 
is set initially at 0100, the 4-digit representation for
the element $\beta$, then, as it shifts, the output is the
same cycle as the 4-digit representation listing of the
$\beta^i$ (i = 1, 2, 3, ..., 15) in Table I, since a shift in
the primitive shift register is the same as multiplication
by $\beta$. No matter which state the register is set to
initially, the correcting register will output elements of
that maximum cycle in the same cycle order as long as the
register continues to shift. If the register is set at
$\beta^i$, it will be in state $\beta^{i+j}$ after j shifts.
[Ref. 10]

Figure 6.7 (the complete single-error-correcting
binary decoder) shows the received word of our example,
namely 110111101010101, whose polynomial form is
$1+x+x^3+x^4+x^5+x^6+x^8+x^{10}+x^{12}+x^{14}$, in a storage
register and the syndrome 0110 in the correcting register.
From our previous discussion we know that the error occurs
in $u_5$. Thus, if the detector register has output 1 as
$u_5$ leaves the storage register and 0 otherwise, the word
110111101010101 will be corrected after fifteen shifts to
read 110110101010101. We illustrate how the correcting
register is used to accomplish this task by listing the new
states of the correcting register, and the outputs from the
storage register and decoder after each shift. The states
are depicted in Table VIII.
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Figure 6.7 The Complete Single-Error-Correcting Binary Decoder 
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NOTE: Register U 5 is corrected from its initial error state 1 to its corrected 

sta te 0 . 



Note from Table VIII that the incorrect digit $u_5$
leaves the storage register when the correcting register is
in state 1000. If the detector is made to produce an output
1 when it detects 1000 and 0 otherwise, then $u_5$ will be
properly corrected. In general, if the syndrome is
$\beta^i$, then the error occurs in the coefficient of
$x^i$, namely $u_i$, where the received polynomial has the
form

$$u(x) = \sum_{i=0}^{n-1} u_i x^i$$

If j is such that $u_{n-j} = u_i$, then $u_i$ leaves the
storage register when the correcting register is in state
$\beta^{i+j}$ as shown below:

State             Output

$\beta^{i+1}$     $u_{n-1}$
$\beta^{i+2}$     $u_{n-2}$
...               ...
$\beta^{i+j}$     $u_{n-j} = u_i$

Since $\beta^{i+j} = \beta^n = 1$, the detector will correct
the digit $u_i$ and the received word will be corrected to
the nearest code word after the decoder completes this
process. [Ref. 10]
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To recapitulate the correction process, the detect- 
ing register computes the syndrome of the received word. As 
each digit of u(x) enters the detecting register it simul- 
taneously enters the storage register. When the syndrome is 
determined, it is transferred to the correcting register for 
the error-correcting procedure just described. 
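
The correcting register lends itself to a short simulation. The sketch below assumes the digits leave the storage register highest-order first and that the detector fires on state 1000; `shift` and `decode_with_correcting_register` are illustrative names rather than anything defined in [Ref. 10].

```python
def shift(state):
    """One shift of the 1 + x + x^4 register, i.e. multiplication by beta."""
    s0, s1, s2, s3 = state
    return [s3, s0 ^ s3, s1, s2]

def decode_with_correcting_register(received, syndrome):
    """Correct a single error without a table of powers of beta.

    received: 15-digit string, coefficients of 1, x, ..., x^14.
    syndrome: 4-digit register contents left by the detecting register.
    """
    word = [int(ch) for ch in received]
    n = len(word)
    state = list(syndrome)
    corrected = [0] * n
    for j in range(1, n + 1):
        state = shift(state)                         # now beta^(i + j)
        flip = 1 if state == [1, 0, 0, 0] else 0     # detector output
        corrected[n - j] = word[n - j] ^ flip        # digit u_(n-j) leaves
    return "".join(map(str, corrected))

print(decode_with_correcting_register("110111101010101", [0, 1, 1, 0]))
# 110110101010101
```

With syndrome $\beta^5$ the register reaches state 1000 on the tenth shift, which is exactly when $u_5$ leaves the storage register. A zero syndrome never reaches 1000, so a valid codeword passes through unchanged.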

3 . Double-Error-Correcting Binary Decoder 

To implement a double-error-correcting binary
decoder we begin with a general analysis of the three stages
that comprise decoding. The first stage is the Syndrome
Generator stage. The syndrome is defined as the nonzero
remainder of the received polynomial when it is divided by
the given primitive shift register. The second stage, the
Central Galois Field Processor, finds the error locator
polynomial $\sigma(z)$ (usually accomplished by using
Berlekamp's iterative algorithm or Massey's linear feedback
shift register synthesis algorithm). At this stage the
polynomial is determined which defines the location of the
errors that have occurred in transmission. Finally, the
third stage, the Chien Searcher stage, finds the roots of
$\sigma(z)$ to determine which digits should be corrected.
Note that, in the binary code, correction is trivial once
the location of the errors is determined, i.e., the bit in
error need only be complemented. [Ref. 11]

Using our previous double-error-correcting generator
polynomial $1+x^4+x^6+x^7+x^8$, which is the product of
$(1+x+x^4)(1+x+x^2+x^3+x^4)$, we are able to produce Stage I
of the decoding process as illustrated by the division
process in Figure 6.8. Similarly, we are also able to
produce Stages II and III (Figures 6.9 and 6.10,
respectively) along with a block diagram of the complete
decoder in Figure 6.11.

The operation of the decoder is relatively
straightforward, as in the previous section. Utilizing a
buffer capable of storing 2n digits, the Chien Searcher is in
the process of evaluating $\sigma(z)$ in order to determine
whether or not the next digit to leave the buffer should be
corrected. The Syndrome Generator at the same time computes
the syndrome of the received word while the Central Galois
Field Processor finds the error-locator polynomial for the
buffered word. Once the coefficients of the error-locator
polynomial are read out of the Central Galois Field
Processor and into the Chien Searcher, the syndrome or the
nonzero remainder of the next block of received words is
read back into the Central Galois Field Processor for
continual operation. See [Ref. 5] for further details of the
multiple error correction process.

If the Central Galois Field Processor operates so 
fast that it is able to compute the error location before 
all of the new received word arrives, then the buffer size 
may be reduced. In general, the buffer is made big enough 
to accommodate the expected worst case for the time to 
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Figure 6.9 Stage II: The Central Galois Field Processor 




Figure 6.10. Stage III: The Chien Searcher 






Figure 6.11 The Complete Double-Error-Correcting Binary Decoder 





compute the locations of the two errors. However, for 
example, suppose that the Central Galois Field Processor is 
able to compute the error locator in half the time required 
for n digits to be received from the channel. In that case, 
the buffer need only be capable of storing 3n/2 digits. 
After a complete word is received, the central processor 
computes its error location by the time the beginning of 
this word is ready to leave the buffer. The error locator 
is then fed into the Chien Searcher, and the central pro- 
cessor sits idle until the rest of the incoming word is 
received. See [Ref. 5] for details. 
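
The buffer sizing argument above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `buffer_digits` and the assumption that the required storage grows linearly as n plus the processor time (measured in digit periods) are not from the text, which states only the two cases 2n and 3n/2.

```python
import math

def buffer_digits(n, locator_time):
    """Buffer size in digits for a word of n digits.

    locator_time is the time the central processor needs to find the
    error locator, in digit periods (assumed 0 <= locator_time <= n).
    """
    if not 0 <= locator_time <= n:
        raise ValueError("locator_time is assumed to lie between 0 and n")
    # One full word is held while it is being received, plus the digits
    # that arrive while the processor is still working on that word.
    return math.ceil(n + locator_time)

n = 15
print(buffer_digits(n, n))       # processor as slow as the channel: 2n = 30
print(buffer_digits(n, n / 2))   # processor twice as fast: 3n/2 = 22.5, rounded up to 23
```

Under this assumed model, a processor that needs a full word time reproduces the 2n buffer, and one that needs half a word time reproduces the 3n/2 buffer.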

Although the above discussion pertains strictly to a
binary decoder capable of correcting two errors, it can be
generalized to correct t or fewer errors. By expanding the
hardware in Stages I and III to accommodate the additional
shift register size required by t distinct minimal
polynomials, we are able to implement the decoder with
approximately the same effectiveness. Likewise, the same
procedure of utilizing the product of t distinct minimal
polynomials would also be used in the design of a
multiple-error-correction binary encoder.

D. REED-SOLOMON DECODER 

As with any multiple-error detection and correction
process, the decoding of RS codes is very complex. As a
result, the known decoding procedures, as discussed by Liu
[Ref. 12], will be presented in this section to obtain a
repetitive and recursive technique which is suitable for
systolic array development and eventual VLSI implementation.

Recall that the information symbols of an RS code are
treated as the coefficients of the polynomial $f(x)$. If we
let

$$f(x) = f_0 + f_1 x + \cdots + f_{N-1} x^{N-1}$$

be the transmitted code vector (where N = codeword length),
and let

$$r(x) = r_0 + r_1 x + \cdots + r_{N-1} x^{N-1}$$

be the received code vector over a noisy channel, then the
error pattern added by the channel is

$$e(x) = r(x) - f(x) = e_0 + e_1 x + \cdots + e_{N-1} x^{N-1}.$$

The first step of the decoding procedure is to store the
received code vector $r_j$ into the buffer register and then
compute the syndrome using the equation

$$S_i = \sum_{j=0}^{N-1} r_j \beta^{(k+i)j} \tag{5}$$

where $0 \le i \le 2E-1$. Since $r_j = f_j + e_j$, equation (5)
can be expressed as

$$S_i = \sum_{j=0}^{N-1} (f_j + e_j)\,\beta^{(k+i)j}
      = \sum_{j=0}^{N-1} f_j \beta^{(k+i)j} + \sum_{j=0}^{N-1} e_j \beta^{(k+i)j}
      = F_{k+i} + E_{k+i}. \tag{6}$$

In the above equation

$$F_{k+i} = \sum_{j=0}^{N-1} f_j \beta^{(k+i)j} \tag{7}$$

and

$$E_{k+i} = \sum_{j=0}^{N-1} e_j \beta^{(k+i)j}. \tag{8}$$

Note that in equation (8) $E_{k+i}$ is the finite field
transform of the $e_j$'s.
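
The syndrome computation of this first step can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the thesis's hardware design: the field GF(16) built from x^4 + x + 1, the offset k = 1, the capability E = 2, the message symbols, and the error pattern are all hypothetical choices made only for the example, and the codeword is formed non-systematically as message times generator polynomial.

```python
# GF(16) built from x^4 + x + 1 (an assumed field; the text fixes none).
M = 4
N = (1 << M) - 1              # codeword length, 15
PRIM_POLY = 0b10011

EXP = [0] * (2 * N)           # EXP[i] = beta^i, doubled to avoid a modulo
LOG = [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & (1 << M):
        x ^= PRIM_POLY
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]

def gf_mul(a, b):
    """Multiply two field elements using the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def poly_mul(a, b):
    """Multiply polynomials given as coefficient lists, low order first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] ^= gf_mul(ai, bj)
    return out

def transform(v, k, count):
    """Return sum_j v[j] * beta^((k+i)j) for i = 0 .. count-1."""
    out = []
    for i in range(count):
        acc = 0
        for j, vj in enumerate(v):
            acc ^= gf_mul(vj, EXP[((k + i) * j) % N])
        out.append(acc)
    return out

K = 1                         # assumed offset k
E_CAP = 2                     # assumed error-correcting capability E

# Generator polynomial with roots beta^k .. beta^(k+2E-1).
g = [1]
for i in range(2 * E_CAP):
    g = poly_mul(g, [EXP[(K + i) % N], 1])

message = [1, 7, 0, 12, 3, 9, 5, 2, 14, 6, 11]   # hypothetical symbols
f = poly_mul(message, g)                          # transmitted code vector
e = [0] * N
e[3], e[10] = 6, 13                               # hypothetical error pattern
r = [fj ^ ej for fj, ej in zip(f, e)]             # received code vector

S = transform(r, K, 2 * E_CAP)        # syndromes S_i
F_part = transform(f, K, 2 * E_CAP)   # F_{k+i}
E_part = transform(e, K, 2 * E_CAP)   # E_{k+i}
print(S, F_part, E_part)
```

In this example every $F_{k+i}$ in the syndrome range is zero, because the codeword has the $\beta^{k+i}$ as roots, so each syndrome equals the corresponding $E_{k+i}$.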

The second step of the decoding procedure is to compute
$\sigma_i$ for $1 \le i \le v$ (where v = number of errors)
from the syndromes $S_i$, $0 \le i \le 2E-1$, computed in the
previous step. This can be accomplished using Berlekamp's
iterative algorithm or Massey's linear feedback shift
register (LFSR) synthesis algorithm [Ref. 12].
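
Massey's LFSR synthesis can be sketched in software as follows. This is a textbook-style illustration rather than the procedure of [Ref. 12]: the field GF(16) from x^4 + x + 1, the offset k = 1, and the two-error pattern are assumptions, and the convention $\sigma(z) = 1 + \sigma_1 z + \cdots + \sigma_v z^v$ with roots at the inverse error locators is likewise assumed. A software Chien search is included only to check the result.

```python
M = 4
N = (1 << M) - 1              # 15
PRIM_POLY = 0b10011           # x^4 + x + 1 (assumed field)

EXP = [0] * (2 * N)
LOG = [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & (1 << M):
        x ^= PRIM_POLY
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]

def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def gf_inv(a):
    return EXP[(N - LOG[a]) % N]

def berlekamp_massey(S):
    """Shortest LFSR (connection polynomial, low order first) generating S."""
    C, B = [1], [1]           # current and previous connection polynomials
    L, m, b = 0, 1, 1         # LFSR length, shift since last change, old discrepancy
    for n in range(len(S)):
        d = S[n]              # discrepancy between S[n] and the LFSR prediction
        for i in range(1, L + 1):
            d ^= gf_mul(C[i], S[n - i])
        if d == 0:
            m += 1
            continue
        coef = gf_mul(d, gf_inv(b))
        T = C[:]
        shifted = [0] * m + [gf_mul(coef, bi) for bi in B]
        if len(shifted) > len(C):
            C = C + [0] * (len(shifted) - len(C))
        for i, s in enumerate(shifted):
            C[i] ^= s         # subtraction is XOR in characteristic 2
        if 2 * L <= n:
            L, B, b, m = n + 1 - L, T, d, 1
            C = C + [0] * (L + 1 - len(C))
        else:
            m += 1
    return C[:L + 1]

def poly_eval(p, z):
    acc = 0
    for c in reversed(p):     # Horner's rule
        acc = gf_mul(acc, z) ^ c
    return acc

# Syndromes of a hypothetical two-error pattern (k = 1, 2E = 4).
K = 1
e = [0] * N
e[3], e[10] = 6, 13
S = []
for i in range(4):
    acc = 0
    for j, ej in enumerate(e):
        acc ^= gf_mul(ej, EXP[((K + i) * j) % N])
    S.append(acc)

sigma = berlekamp_massey(S)
# Chien search: position j is in error when sigma(beta^-j) = 0.
positions = [j for j in range(N) if poly_eval(sigma, EXP[(N - j) % N]) == 0]
print(sigma, positions)       # [1, 15, 13] [3, 10]
```

The recovered polynomial has degree v = 2, and its roots identify the two positions that were corrupted in the example.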

Upon obtaining the $\sigma_i$'s, the third step of the
decoding procedure is to compute the remaining $E_{k+i}$ for
$2E \le i \le N-1$ recursively from the $\sigma_i$'s and the
previously computed $E_{k+i}$, where

$$E_{k+i} = E_{k+i-N} \quad \text{for } k+i \ge N.$$
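
The recursive extension of the transform can be sketched as follows. The recursion used here, $E_{k+i} = \sum_{j=1}^{v} \sigma_j E_{k+i-j}$, is an assumption: it is the standard form in a field of characteristic 2 when $\sigma(z) = \prod_l (1 + X_l z)$, and the sign convention of [Ref. 12] may differ. The field GF(16), the offset k = 1, and the error pattern are also hypothetical, and the $\sigma_j$ are formed directly from the known error locators to keep the example short.

```python
M = 4
N = (1 << M) - 1              # 15
PRIM_POLY = 0b10011           # x^4 + x + 1 (assumed field)

EXP = [0] * (2 * N)
LOG = [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & (1 << M):
        x ^= PRIM_POLY
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]

def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def transform_at(e, t):
    """E_t = sum_j e[j] * beta^(t j)."""
    acc = 0
    for j, ej in enumerate(e):
        acc ^= gf_mul(ej, EXP[(t * j) % N])
    return acc

def extend_spectrum(known, sigma):
    """Extend E_{k+i}, i = 0..2E-1, to i = 0..N-1 (assumed recursion)."""
    ext = list(known)
    v = len(sigma) - 1
    for i in range(len(known), N):
        acc = 0
        for j in range(1, v + 1):
            acc ^= gf_mul(sigma[j], ext[i - j])
        ext.append(acc)
    return ext

K, TWO_E = 1, 4
e = [0] * N
e[3], e[10] = 6, 13                       # hypothetical error pattern
X1, X2 = EXP[3], EXP[10]                  # error locators beta^3, beta^10
sigma = [1, X1 ^ X2, gf_mul(X1, X2)]      # (1 + X1 z)(1 + X2 z)

known = [transform_at(e, K + i) for i in range(TWO_E)]   # equals the syndromes
ext = extend_spectrum(known, sigma)

# Reference: the same values computed directly from the error pattern.
direct = [transform_at(e, K + i) for i in range(N)]

# Re-index by (k+i) mod N, using E_{k+i} = E_{k+i-N} for k+i >= N.
spectrum = [0] * N
for i, val in enumerate(ext):
    spectrum[(K + i) % N] = val
print(ext == direct)
```

Only the first 2E values come from the syndromes; in this example the remaining N - 2E values produced by the recursion agree with the transform computed directly from the error pattern.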

After determining the transform of the error pattern
$E_{k+i}$ for $0 \le k+i \le N-1$, by equation (8), we can then
apply the inverse transform to $E_{k+i}$ to obtain the error
pattern $e_j$, i.e.,

$$e_j = (N)^{-1} \sum_{k+i=0}^{N-1} E_{k+i}\,\beta^{-(k+i)j}$$

for $j = 0, 1, 2, \ldots, N-1$. Then the corrected codeword is
obtained by subtracting the error pattern $e_j$ from the stored
code vector $r_j$ in the buffer register.
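
The inverse transform and the final subtraction can be sketched as follows. The field GF(16) from x^4 + x + 1 and all symbol values are assumptions made for the example, and the vector `f` is only a stand-in for a transmitted code vector (it is not checked to be a codeword). In a field of characteristic 2 with N = 2^m - 1 odd, N reduces to 1, so the factor $(N)^{-1}$ is 1, and subtraction is a bitwise XOR.

```python
M = 4
N = (1 << M) - 1              # 15
PRIM_POLY = 0b10011           # x^4 + x + 1 (assumed field)

EXP = [0] * (2 * N)
LOG = [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & (1 << M):
        x ^= PRIM_POLY
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]

def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def forward(e):
    """E_t = sum_j e[j] * beta^(t j) for t = 0..N-1."""
    out = []
    for t in range(N):
        acc = 0
        for j, ej in enumerate(e):
            acc ^= gf_mul(ej, EXP[(t * j) % N])
        out.append(acc)
    return out

def inverse(spectrum):
    """e_j = N^-1 * sum_t E_t * beta^(-t j); here N^-1 = 1."""
    out = []
    for j in range(N):
        acc = 0
        for t, Et in enumerate(spectrum):
            acc ^= gf_mul(Et, EXP[(-t * j) % N])
        out.append(acc)
    return out

f = [4, 0, 9, 1, 13, 2, 2, 7, 0, 5, 11, 3, 8, 6, 10]   # stand-in code vector
e = [0] * N
e[3], e[10] = 6, 13                                     # hypothetical errors
r = [fj ^ ej for fj, ej in zip(f, e)]                   # buffered received vector

spectrum = forward(e)          # in the decoder this comes from steps 1 to 3
recovered = inverse(spectrum)
corrected = [rj ^ ej for rj, ej in zip(r, recovered)]
print(recovered == e, corrected == f)
```

In the decoder the full transform is supplied by the syndromes and the recursive extension; it is computed directly from the error pattern here only so that the example is self-contained.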
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